Supplementary material for CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Neural Information Processing Systems

The dataset evaluates the ability of LLMs to understand and analyze various aspects of open-source CTI. It consists of 5 TSV files, each corresponding to a different task; individual tasks assess, for example, the LLMs' ability to evaluate severity. Each file has a "Prompt" column used to pose questions to the LLM, and most files also include a "GT" (ground-truth) column. The dataset includes URLs indicating the sources from which the data was collected, and a permanent DOI identifier is associated with it (AI4Sec, 2024).
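
A minimal sketch of consuming one of the TSV files described above. Only the TSV format and the "Prompt" and "GT" column names come from the description; the file name, the model callable, and exact-match scoring are illustrative assumptions.

```python
import csv

def load_task(path: str):
    """Yield (prompt, ground_truth) pairs from a CTIBench-style TSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            yield row["Prompt"], row.get("GT")  # some files may lack a GT column

def exact_match_accuracy(path: str, ask_llm) -> float:
    """Score a model (a callable prompt -> answer) by exact match against GT."""
    hits = total = 0
    for prompt, gt in load_task(path):
        if gt is None:
            continue
        total += 1
        hits += int(ask_llm(prompt).strip() == gt.strip())
    return hits / total if total else 0.0

# Usage (the file name is hypothetical):
# print(exact_match_accuracy("cti_task1.tsv", my_model))
```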


Backdoor defense, learnability and obfuscation

Christiano, Paul, Hilton, Jacob, Lecomte, Victor, Xu, Mark

arXiv.org Artificial Intelligence

We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.
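
A toy rendition of the attacker/defender game sketched above, to make the protocol concrete. The function class, attacker, and defender used here are illustrative stand-ins, not constructions from the paper.

```python
import random

N = 16  # inputs are integers in [0, N)

def sample_function():
    """A 'clean' function from a simple class: threshold functions on [0, N)."""
    t = random.randrange(N)
    return lambda x: int(x >= t)

def attacker(f, trigger):
    """Return a backdoored copy of f that flips its output only on the trigger."""
    return lambda x: 1 - f(x) if x == trigger else f(x)

def defender(f_backdoored, x):
    """Placeholder defender: flag x as the trigger if it disagrees with its
    neighbours (a crude stand-in for the voting idea, not the paper's algorithm)."""
    neighbours = [(x - 1) % N, (x + 1) % N]
    votes = sum(f_backdoored(v) for v in neighbours)
    majority = int(votes >= 1)
    return f_backdoored(x) != majority

# One round of the game with a randomly chosen trigger.
f = sample_function()
trigger = random.randrange(N)          # the key constraint: the trigger is random
g = attacker(f, trigger)
print("defender flags trigger:", defender(g, trigger))
```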


Handling Ontology Gaps in Semantic Parsing

Bacciu, Andrea, Damonte, Marco, Basaldella, Marco, Monti, Emilio

arXiv.org Artificial Intelligence

The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads such models to generate hallucinated outputs rather than admit their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a mechanism to prevent this behavior is crucial to building trusted NSP-based Question Answering agents. To that end, we propose the Hallucination Simulation Framework (HSF), a general setting for stimulating and analyzing NSP model hallucinations. The framework can be applied to any NSP task with a closed ontology. Using the proposed framework and KQA Pro as the benchmark dataset, we assess state-of-the-art techniques for hallucination detection. We then present a novel hallucination detection strategy that exploits the computational graph of the NSP model to detect NSP hallucinations in the presence of ontology gaps and out-of-domain utterances, and to recognize NSP errors, improving the F1-Score by ~21%, ~24%, and ~1%, respectively. This is the first work in closed-ontology NSP that addresses the problem of recognizing ontology gaps. We release our code and checkpoints at https://github.com/amazon-science/handling-ontology-gaps-in-semantic-parsing.
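
The abstract leaves the graph-based detector unspecified, so the sketch below is only a minimal confidence-threshold baseline for flagging possible hallucinations in a closed-ontology parser; it is not the strategy proposed in the paper.

```python
def flag_hallucination(token_logprobs, threshold=-1.0):
    """Flag a parse whose mean token log-probability falls below a threshold,
    on the assumption that out-of-ontology requests yield less confident parses."""
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return mean_lp < threshold

# A parse decoded with per-token log-probabilities from the NSP model:
print(flag_hallucination([-0.2, -0.1, -3.5, -2.8]))  # True: low average confidence
```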


Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Abdali, Sara, Anarfi, Richard, Barberan, CJ, He, Jia

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.


Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

Clymer, Joshua, Juang, Caden, Field, Severin

arXiv.org Artificial Intelligence

Like criminals under investigation, Large Language Models (LLMs) might pretend to be aligned while being evaluated and misbehave when they have a good opportunity. Can current interpretability methods catch these 'alignment fakers'? To answer this question, we introduce a benchmark that consists of 324 pairs of LLMs fine-tuned to select actions in role-play scenarios. One model in each pair is consistently benign (aligned). The other model misbehaves in scenarios where it is unlikely to be caught (alignment faking). The task is to identify the alignment-faking model using only inputs where the two models behave identically. We test five detection strategies, one of which identifies 98% of alignment fakers.
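
A schematic of the kind of internals-based test the benchmark calls for: perturb hidden activations of both models on inputs where they act identically and flag the model whose preference for the bad action shifts more. This is an illustrative probe under assumed HuggingFace-style APIs, not a reproduction of any of the paper's five strategies.

```python
import torch

def shift_under_noise(model, inputs, layer, bad_action_id, scale=4.0, trials=8):
    """Average increase in the 'misbehave' logit when Gaussian noise is added to
    the output of a chosen module (assumed to return a plain tensor)."""
    def add_noise(module, module_in, module_out):
        return module_out + scale * torch.randn_like(module_out)

    with torch.no_grad():
        base = model(**inputs).logits[0, -1, bad_action_id]
        handle = layer.register_forward_hook(add_noise)
        try:
            noisy = torch.stack([model(**inputs).logits[0, -1, bad_action_id]
                                 for _ in range(trials)]).mean()
        finally:
            handle.remove()
    return (noisy - base).item()

# Flag whichever of the two paired models drifts further toward the bad action.
```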


Australian universities to return to 'pen and paper' exams after students caught using AI to write essays

#artificialintelligence

Australian universities have been forced to change the way they run exams and other assessments amid fears students are using emerging artificial intelligence software to write essays. Major institutions have added new rules which state that the use of AI is cheating, with some students already caught using the software. But one AI expert has warned universities are in an "arms race" they can never win. ChatGPT, which generates text on any subject in response to a prompt or query, was launched in November by OpenAI and has already been banned across all devices in New York's public schools due to concerns over its "negative impact on student learning" and potential for plagiarism. In London, one academic tested it against a 2022 exam question and said the AI's answer was "coherent, comprehensive and sticks to the points, something students often fail to do", adding he would have to "set a different kind of exam" or deprive students of internet access for future exams. In Australia, academics have cited concerns over ChatGPT and similar technology's ability to evade anti-plagiarism software while providing quick and credible academic writing.


A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning

Zhang, Hongming, Sun, Ke, Xu, Bo, Kong, Linglong, Müller, Martin

arXiv.org Artificial Intelligence

Abnormal states in deep reinforcement learning (RL) are states that are beyond the scope of an RL policy. Such states may make the RL system unsafe and impede its deployment in real scenarios. In this paper, we propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution (OOD) state outliers. In particular, we attain the class-conditional distributions for each action class under the Gaussian assumption, and rely on these distributions to discriminate between inliers and outliers based on Mahalanobis Distance (MD) and Robust Mahalanobis Distance. We conduct extensive experiments on Atari games that verify the effectiveness of our detection strategies. To the best of our knowledge, we present the first in-detail study of statistical and adversarial anomaly detection in deep RL algorithms. This simple unified anomaly detection paves the way towards deploying safe RL systems in real-world applications.
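
A compact sketch of the Mahalanobis-distance detector described above: fit a class-conditional Gaussian over state features for each action class, then score a new state by its distance to the nearest class mean. Feature extraction from the policy network is assumed and left abstract; the tied covariance is a common design choice for MD detectors, not necessarily the paper's.

```python
import numpy as np

def fit_class_gaussians(features, actions, eps=1e-6):
    """Per-action means plus a shared (tied) covariance over centered features."""
    classes = np.unique(actions)
    means = {a: features[actions == a].mean(axis=0) for a in classes}
    centered = np.vstack([features[actions == a] - means[a] for a in classes])
    cov = np.cov(centered, rowvar=False) + eps * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def mahalanobis_score(x, means, cov_inv):
    """Smaller score = more inlier-like; threshold on this to flag OOD/adversarial states."""
    dists = [float((x - m) @ cov_inv @ (x - m)) for m in means.values()]
    return min(dists)

# Usage: fit on features of in-distribution states, then flag states whose score
# exceeds a threshold chosen on held-out data.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
acts = rng.integers(0, 4, size=500)
means, cov_inv = fit_class_gaussians(feats, acts)
print(mahalanobis_score(rng.normal(size=8) + 5.0, means, cov_inv))  # large -> outlier
```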


An Innovative Attack Modelling and Attack Detection Approach for a Waiting Time-based Adaptive Traffic Signal Controller

Dasgupta, Sagar, Hollis, Courtland, Rahman, Mizanur, Atkison, Travis

arXiv.org Artificial Intelligence

The evolution of mainstream transportation systems towards a connected cyber infrastructure, such as connected traffic signal controllers, is increasing system vulnerability to cyber attacks, allowing malicious actors (individuals, criminals, or terrorist organizations) to exploit security vulnerabilities of such transportation infrastructure (1)-(3). In the U.S., the number of cyberattacks on smart mobility systems has jumped significantly in recent years (4). As vehicles move towards connected and automated driving, and cities focus on creating a transportation cyber infrastructure that transforms legacy infrastructure into connected, adaptable, and automated systems, these security problems will only grow and further compromise public safety (5). Many studies show that a cyber attack on connected vehicle-based (CV-based) traffic signal control algorithms can break down a traffic network by creating severe congestion (6-10). An adaptive traffic signal controller (ATSC) combined with a connected vehicle (CV) concept uses real-time vehicle trajectory data to regulate green time; this combination can also significantly reduce intersection waiting time and improve travel time in a signalized corridor (11). A CV-based ATSC can be manipulated in two ways: (i) gaining access through vulnerabilities and exploiting the ATSC directly; and (ii) producing abnormal behavior by manipulating the vehicle-related data it receives (9).
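
The excerpt notes that a CV-based ATSC regulates green time from real-time vehicle trajectory data. The toy allocator below, which splits a fixed cycle in proportion to accumulated waiting time per approach, is a hypothetical illustration of that idea, not the controller studied in the paper.

```python
def allocate_green(waiting_times, cycle_s=90, min_green_s=10):
    """waiting_times: dict approach -> total vehicle waiting time (s) from CV data."""
    total = sum(waiting_times.values()) or 1.0
    spare = cycle_s - min_green_s * len(waiting_times)
    return {a: min_green_s + spare * w / total for a, w in waiting_times.items()}

print(allocate_green({"NB": 120.0, "SB": 40.0, "EB": 200.0, "WB": 40.0}))
```

Note how spoofed trajectory data that inflates one approach's waiting time would directly skew this split, which is the second manipulation path the excerpt describes.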


Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations

Badjatiya, Pinkesh, Gupta, Manish, Varma, Vasudeva

arXiv.org Artificial Intelligence

With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for structured datasets, but we aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias for any model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalizations provide an effective way to encode knowledge because the abstraction they provide not only generalizes content but also facilitates retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and on mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.
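
A minimal sketch of the generalization step described above: replace words the model is found to stereotype with a more general concept before (re)training the classifier. The stereotyped-word set and the word-to-generalization mapping below are hypothetical placeholders; the paper derives them from bias measurements and existing knowledge bases.

```python
# Hypothetical mapping from stereotyped words to knowledge-base generalizations.
GENERALIZATIONS = {
    "muslim": "religious_group",
    "immigrant": "social_group",
    "female": "person",
}

def generalize(tokens, stereotyped, mapping=GENERALIZATIONS):
    """Swap stereotyped tokens for their knowledge-based generalization."""
    return [mapping.get(t, t) if t in stereotyped else t for t in tokens]

print(generalize("ban every immigrant now".split(), stereotyped={"immigrant"}))
# ['ban', 'every', 'social_group', 'now']
```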